# **Lab Assignment 6**

Matrix multiplication using Systolic Arrays

(You are allowed to work in groups of 2 for this Lab)

#### Objective

To perform matrix multiplication using Systolic Array structures.

### **Description**

A systolic array is a network of processors that rhythmically compute and pass data through the system.



A data item is used not only when it enters the array but is also reused as it moves through the pipelines in the array.

#### **Design**

In this lab, you are required to implement a 3 × 3 matrix multiplier using a systolic array structure built with MAC (Multiply and Accumulate) units.

The MAC operation modifies an accumulator register A as follows:

$$A \leftarrow A + (B \times C)$$

where B and C are the two input operands of the unit.
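As a behavioural illustration only, one MAC cell of a systolic array can be sketched in Python as follows. The class name `Mac`, its fields, and the `clock` method are hypothetical names chosen for this sketch, and ordinary Python numbers stand in for the 8-bit format. The sketch assumes that each cell forwards its two operands to its neighbours one cycle after receiving them.

```python
class Mac:
    """Behavioural model of one MAC cell (all names are hypothetical)."""

    def __init__(self):
        self.acc = 0    # accumulator register A
        self.a_out = 0  # operand forwarded to the right-hand neighbour
        self.b_out = 0  # operand forwarded to the neighbour below

    def clock(self, a_in, b_in):
        """One clock edge: accumulate the product and latch both operands."""
        self.acc += a_in * b_in
        self.a_out = a_in
        self.b_out = b_in


# Usage: two clock edges accumulate 2*3 + 4*5.
cell = Mac()
cell.clock(2, 3)
cell.clock(4, 5)
print(cell.acc)
```

The structural Verilog version would replace the `+=` with explicit multiplier and adder instances feeding a register.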

You are expected to use a specific floating point number representation to implement the MAC unit. Each matrix element is 8 bits wide.

The 8-bit representation uses **1** bit for the sign, **3** bits for the exponent, and **4** bits for the mantissa (fraction), with a leading **1**. Floating point number representation is described in Chapter 7 of the textbook. You are advised to read through that section before starting this lab.
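To make the bit layout concrete, here is a minimal Python sketch that decodes and encodes the 1-3-4 format. Several details are assumptions that the assignment does not state: an exponent bias of 3, an implicit (not stored) leading 1, and no special encodings for zero, infinity, NaN, or subnormals. The names `BIAS`, `decode`, and `encode` are hypothetical; follow the textbook's convention wherever it differs.

```python
import math

BIAS = 3  # assumed exponent bias; not specified in the assignment


def decode(byte):
    """Interpret an 8-bit pattern as sign(1) | exponent(3) | fraction(4)."""
    sign = (byte >> 7) & 0x1
    exponent = (byte >> 4) & 0x7
    fraction = byte & 0xF
    significand = 1 + fraction / 16  # assumed implicit leading 1
    value = significand * 2.0 ** (exponent - BIAS)
    return -value if sign else value


def encode(value):
    """Truncate a nonzero value to the 8-bit format (no rounding)."""
    if value == 0:
        raise ValueError("zero has no encoding under these assumptions")
    sign = 1 if value < 0 else 0
    m, e = math.frexp(abs(value))  # abs(value) == m * 2**e, 0.5 <= m < 1
    significand = 2 * m            # rescale to 1 <= significand < 2
    exponent = e - 1 + BIAS
    if not 0 <= exponent <= 7:
        raise ValueError("value out of range for a 3-bit exponent")
    fraction = int((significand - 1) * 16)  # drop the low bits (truncation)
    return (sign << 7) | (exponent << 4) | fraction


# Usage: 3.1 is not representable, so it truncates to the pattern for 3.0.
print(format(encode(3.1), "08b"), decode(encode(3.1)))
```

Under these assumptions the representable magnitudes run from 0.125 to 31, which is worth keeping in mind when choosing test matrices.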

The matrix multiplier operation is illustrated in Figure 2.



Figure 2

To multiply two 3 × 3 matrices, $3^3$ multiplications need to be performed. It is possible to use $N^2$ MAC units so that each MAC performs only N multiply-accumulate steps; because the operands are staggered, the whole multiplication completes in $3N - 2$ cycles. For example, consider the fabric with 9 MACs shown in Figure 2. To multiply the two 3 × 3 matrices, the matrix elements are staged as shown in the figure. Figure 2(a) shows the condition at time 1. In the first cycle, $a_{0,0}$ and $b_{0,0}$ propagate into the top-left MAC and are multiplied. In the next cycle, $a_{0,0}$ moves to the right and $b_{0,0}$ moves down, so two more MACs are involved in cycle 2. Figure 2(b) shows the location of the operands in the $3^{rd}$ cycle. In the $7^{th}$ cycle, all multiplications and additions are completed, and each of the $N^2$ MACs holds one element of the result. Please note that your output signals are also expected to be 8 bits wide, so take care of the required truncation in your design.
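The cycle-by-cycle data flow described above can be checked with a small behavioural model in Python, which may also serve as a reference when writing test bench expectations. This is a sketch under stated assumptions, not the required structural Verilog: the function name `systolic_matmul` is hypothetical, ordinary Python numbers replace the 8-bit format, and the staging is assumed to delay row i of the first matrix by i cycles and column j of the second by j cycles, which is consistent with the first two cycles described in the text.

```python
def systolic_matmul(A, B):
    """Simulate an N x N array of MACs; returns (result, cycles used)."""
    n = len(A)
    acc = [[0] * n for _ in range(n)]    # accumulator inside each MAC
    a_reg = [[0] * n for _ in range(n)]  # operands travelling left to right
    b_reg = [[0] * n for _ in range(n)]  # operands travelling top to bottom
    cycles = 3 * n - 2
    for t in range(cycles):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i  # row i enters the left edge i cycles late
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # take from left neighbour
                if i == 0:
                    k = t - j  # column j enters the top edge j cycles late
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # take from upper neighbour
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]  # the MAC operation
    return acc, cycles


# Usage: a 3 x 3 product finishes in 7 cycles.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
result, cycles = systolic_matmul(A, B)
print(cycles, result)
```

Zeros are fed at the array edges before and after the real operands, so idle MACs add nothing to their accumulators. The Python loops here only drive the simulation; the restriction on for loops in the submission applies to the Verilog design.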

## **Submission Details**

- Verilog files of all modules
- Test bench in Verilog
- We expect structural modeling of the MAC unit (using for loops for matrix multiplication is not allowed).

## **Checkout Details**

You are only expected to simulate the matrix multiplier structure in Vivado/ModelSim.